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Abstract 

The paper reports on a computer algebra pro- 
gram LSSS (Linear Selective Systems Solver) 
for solving linear algebraic systems with ratio- 
nal coefficients. The program is especially effi- 
cient for very large sparse systems that have a 
solution in which many variables take the value 
zero. The program is applied to the symmetry 
investigation of a non-abelian Laurent ODE in- 
troduced recently by M. Kontsevich. The com- 
puted symmetries confirmed that a Lax pair 
found for this system earlier generates all first 
integrals of degree at least up to 14. 

1 Introduction 

In mathematics, science and engineering many 
problems lead to sparse linear algebraic sys- 
tems. For example, in a discretization of a 
smooth object the relations between variables 
defined in neighbouring points typically involve 
only a small number of variables that is depen- 
dent on the dimension of the object which is 
low. 

For sparse linear systems there are computer 
algebra programs available, for example, the 



code of Roman Pearce [T5] that has been incor- 
porated into MAPLE 14: SolveTools : -Linear. 
It is automatically applied if a sparse system is 
to be solved. 

The typical concern of solvers for sparse sys- 
tems is to avoid a choice of pivot that increases 
the number of variables in equations and thus 
leads to non-sparse equations during the solu- 
tion process. For many problems this can not 
be avoided, only delayed. For example, when 
solving the sparse linear system resulting from 
the discretization of the (partial differential) 
Laplace equation (i.e. heat equation in the sta- 
tionary limit) it is clear that the temperature at 
each single point depends on the temperature at 
all boundary points and that the temperature 
inside is typically non-zero|]] 

The linear systems we studied are also sparse 
but different in nature. The value of most of 
their variables in the solution is exactly zero. 
Such systems play the role of selectors and are 
therefore called "selection systems" in this pa- 
per, because from a large number of monomials, 
all with a constant undetermined coefficient, 



^ven if the temperature on the whole boundary is 
zero except on an £-sized part where it is positive, the 
temperature at all inner points will be positive. 
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some linear combinations of the monomials are 
selected which satisfy some condition and the 
other monomials are dropped by setting their 
coefficient to zero. This is very different from 
linear sparse systems like those resulting from 
the discretization of the Laplace equation. Ta- 
ble [T] compares both types of problems. 

Selection systems have a number of useful 
properties. 

• The existence of zero-valued variables al- 
lows an effective solution as performed by 
the program LSSS, described in this pa- 
per. LSSS is running under the computer 
algebra system Reduce ([5]). 

• From the application it is clear whether 
a linear system belongs to this class, i.e. 
whether many variables take the value zero 
and thus a specialized computer program 
should be applied. 

• The efficiency of solving selection systems 
increases with the complexity of the prob- 
lem. For example, the higher the degree of 
the polynomial ansatz in a symmetry inves- 
tigation - the larger is the size of the linear 
system for the unknown coefficients but the 
higher is also the percentage of zero- valued 
coefficients and the higher is the efficiency 
of the solution technique that uses zero- 
valued variables (see section |3J). 

• Not only can such linear systems be solved 
efficiently, they can also be formulated 
more economically. More precisely, it is 
possible to formulate much smaller equiv- 
alent systems as shown in section 13.71 

The strategy to have many different meth- 
ods available to solve a system of equations 
and to give those methods the highest priority 
which make most progress and are least risky to 
lead to an explosion of size is not new. In the 
Maple command solve this is used since 1985 
even with the heuristic of substituting zero vari- 
ables first (see [2j) and in the Reduce package 
Crack such strategies including substitution of 
variables by zero as highest priority are applied 
to the solution of overdetermined algebraic and 



differential systems ([13]. What is new in this 
paper is the realization that if variables take 
the value of zero in linear systems, then usually 
many variables vanish and then very fast rou- 
tines for identifying 1-term equations, for drop- 
ping such variables from equations and even for 
avoiding the construction of equations pays of. 

What justifies the creation of special purpose 
programs for selection systems is the fact that 
these systems occur frequently, especially in in- 
tegrability investigations of differential equa- 
tions when computing infinitesimal symmetries, 
first integrals, conservation laws, variational 
principles or other qualitative properties which 
may but need not exist and for which an ansatz 
can be formulated. 

The following section describes an applica- 
tion that leads to a series of sparse systems with 
a high percentage of zero- valued variables. This 
application was the starting point for the devel- 
opment of LSSS. Readers interested essentially 
in the computational aspects of formulating and 
solving the systems may proceed directly to sec- 
tion |3J which describes methods for formulating 
and solving selection systems efficiently. 

The times for solving the linear systems re- 
sulting in the applications are shown in section 
IH The subsequent section \5\ discusses aspects 
of the computer algebra system Reduce which 
become important in large computations. Sec- 
tions [6] and [7] report on tests of other computer 
algebra systems. The paper concludes with a 
list of available procedures in section |3J and a 
summary. 

2 The application 

2.1 A non-abelian ODE-system 

In the programme of M. Kontsevich of creating 
a proper non-commutative algebraic geometry 
inspired by modern quantum field-theoretic re- 
quirements and challenges, he proposed a non- 
commutative version of symplectic geometry in 
which integrable non-abelian ODE-systems ap- 
pear naturally (PQ). 

More specifically, he considered the discrete 
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Table 1: Characterization of two different types of sparse linear systems 



map 

u — > uvuT x , v — > u^ 1 + v^u^ 1 (1) 

([2]) and the following non-abelian ODE system 
which is invariant under ([1]): 

u t = uv — uv^ 1 — v~ , v t = —vu + vu^ 1 + vT 1 

(2) 

where u, v are non-commutative variables (in 
particular, square matrices of arbitrary size). In 
comparison to non-abelian ODE systems with 
polynomial right hand sides investigated first 
in [3] and later in [I], [3] the system (J2J) in- 
volves Laurent polynomials, i.e. polynomials in 
u,v and inverses m -1 ,i> -1 . 

Based on numerical experiments Kontsevich 
conjectured that (j2D is integrable itself. This 
was demonstrated recently in [6] where a Lax 
pair has been found which also proves integra- 
bility of (JTJ (see section 2 of [2]). In addi- 
tion a pre-Hamiltonian operator is given in [6] 
that maps gradients of trace first integrals to 
infinitesimal symmetries. The existence of in- 
finitely many symmetries and even better of a 
Lax pair is taken as an indicator of integrabil- 
ity of an ODE or PDE system (|7j). To verify 
that the Lax pair produces all Laurent polyno- 
mial first integrals our strategy is to compute by 
brute force all Laurent polynomial infinitesimal 
symmetries up to some degree and to compare 



them with the symmetries produced from the 
Lax pair. This paper reports on the computer 
algebra program LSSS that was developed to 
solve the linear algebraic systems resulting in 
the computation of all such symmetries up to 
degree 16. 

2.2 Symmetries 

For a given system of equations 

u t = Pi(w,t>,ii _1 , v _1 ), v t = P 2 {u,v,u~ 1 ,v~ 1 ) 

(3) 

where P\,P2 are polynomials in u,v,u 1 ,v 1 
a Laurent polynomial (infinitesimal) symmetry 
is defined through two polynomials Q±,Q2 in 
u, v, u' 1 , v _1 such that the flow 

Ut = Ql(u, V, U" 1 , V' 1 ), V T = Q 2 (U, V, U^ 1 , iT 1 ) 

(4) 

commutes with the system, i.e. 

D T D t u = D t D T u, (5a) 

D T D t v = D t D T v. (5b) 

The vector {Qx^Q^f is called generator of the 
symmetry. 

In this paper the system (E]) is given through 
(j2J). The polynomials Q±,Q 2 are generated by 
a special purpose procedure that creates the 
most general (inhomogeneous) polynomials up 
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to some given degree in the non-abelian vari- 
ables u, v, vT 1 , v^ 1 with symbolic (abelian) co- 
efficients Cj. The only condition satisfied by 
Q11Q2 is that and t>,i> _1 are not stand- 

ing next to each other anywhere in any term as 
they would cancel. 

The symmetry conditions fl5J) have to be sat- 
isfied identically for any u, v. That means, after 
(j5J) is formulated, the coefficients of all different 
products of powers of -u, f, -u -1 , i> _1 have to be 
set to zero generating a linear system for the 
unknown coefficients in the symmetry genera- 
torsdlj. This process is called 'splitting' in the 
remainder of the paper. 

The number t n of terms of each Qi of de- 
gree n (which is one half of the number of un- 
known coefficients) satisfies the recursive rela- 
tion t n = 3t n _i + 2 with to = 1 because apart 
from the coefficients c$ the polynomial of de- 
gree has only the term 1 and the most general 
polynomial of degree n is obtained by multiply- 
ing each term of the most general polynomial 
of degree n — 1 from one side, say from the 
left, with all possible three of the four factors 
u, v,u~ , v~ l which do not give a cancellation. 
One exception is the term 1 that is multiplied 
with each one of the 4 factors i.e. from this term 
results one extra term and the extra term 1 is 
added giving t n = 3£ n _i + 1 + 1 = 3t n _i + 2. 
Thus the total number of terms t n occuring in 
Qi and Q2 of degree n satisfies t n = 3t n -i + 4. 



2.3 Necessary symmetry condi- 
tions 

Additional information available for our sym- 
metry computation results from a separate com- 
putation of full first integrals I satisfying D t I = 
where D t is the total time-derivative. The 
condition that a Laurent polynomial in u, v is 
a first integral is very restrictive because it im- 
plies that each of the mx m components of the 
matrix first integral is an integral of its own. 
Restrictive conditions lead to very overdeter- 
mined systems (here linear) which are easier to 
solve. Therefore it was possible to compute all 
first integrals up to degree 14 with the result 



that 



uvu v , 



r 1 



vuv u 1 (6) 



and their integer powers I k , k = —3, ..,3 are 
the only first integrals up to degree 14. 

From the symmetry conditions flSJ) and D t I = 
follows 

D t D T I = D T D t I = 

and further from I k being the only first integrals 
(up to degree 14) 



fco 

D T I= J2 a kl\ 

fc=— fco 



(7a) 



and similarly 



D T I 



-1 



fco 

fc= — fco 



(7b) 



for sufficiently high ko, in our case ko = 3 be- 
cause / is of degree 4. These conditions involve 
a few more extra unknown constants ak, bk but 



adding f!7bj) is beneficial because these are first 
order conditions involving fewer terms than the 
full symmetry conditions (JSJ) which are of sec- 
ond order. 



3 The solution of selection 
systems 

The linear systems resulting from splitting the 
necessary and sufficient conditions (J5J) and the 
additional necessary conditions ([7]) cover a wide 
range of sizes as shown in table [2j As dis- 
cussed in section 12.21 the numbers k n of un- 
knowns for a symmetry ansatz of degree n grow 
like k n+ i = 3k n + 4 which is also the growth rate 
of the number e\ of equations in the necessary 
condition D T I = 00 

As the table indicates, the linear systems 
have a few characteristic properties that need 
to be exploited in order to compute high degree 
symmetries. These properties are: 



2 The vanishing of constants ak in ([7a| and bk in (|Tb[) 
follows when formulating these conditions. 
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Table 2: For each symmetry ansatz of degree n = 3. .16 are listed the numbers: k of variables, 
e 1 of equations and t x of terms of system D T I = 0, e 2 of equations and t 2 of terms of system 
[D t , D T ](u, v) = and p of free parameters of the solution. 

• Overdetermination: There are 4.5 to 5.5 
times as many equations as unknowns. 

• Sparsity: Equations have on average 4 to 5 
terms. 

• Zero-valued Variables: Most of the vari- 
ables take an exact value of zero in the 
solution. Even more importantly, the per- 
centage of zero-valued variables is increas- 
ing as the degree of the ansatz increases, 
i.e. as the system to be solved increases in 
size. For systems resulting from symme- 
tries of degree 4 this percentage is 93.2% 
and for symmetries of degree 16 the per- 
centage is 99.966% . Nevertheless, the so- 
lution for degree 16 is not trivial. It has 
58118 non-vanishing variables and 32 free 
parameters. 

3.1 Ideas for a solver of large 
sparse linear algebraic sys- 
tems 

The following ideas take advantage of the fea- Any extra low level procedures that were writ- 
tures of selection systems as listed above and ten to implement these concepts assume that 



describe the stages of development of the com- 
puter algebra program LSSS. These stages have 
been to: 

• start with a program for solving a stream 
of equations, 

• sort equations initially by size, shortest 
first, 

• apply 1-term equations very efficiently and 
apply them before any other equation, 

• be able to generate only 1-term equations 
before formulating the whole system, 

• iterate between the exclusive formulation 
and the application of 1-term equations, 

• be able to generate and take advantage of 
extra necessary conditions before working 
on the original system, and 

• choose from possibly different options to 
generate 1-term equations the most effi- 
cient one. 



5 



systems are linear in the unknowns. This al- 
lows them to be more efficient than universal 
routines. 

In the following subsections more detailed 
comments are made to each of the measures. 

3.2 A stream of equations 

In earlier work [12] large and very overdeter- 
mined polynomial systems were solved which 
had too many equations (in the order of 10 14 ) 
to be even formulated initially. A linear alge- 
braic system solver was developed at the time 
for related problems dealing with the system of 
equations as an incoming stream of data not to 
be stored in RAM memory, only to be processed 
equation by equation. A strength of this algo- 
rithm is that any limitation on available RAM 
memory poses only restrictions on the size of 
the preliminary solution of the system and thus 
only on the number of variables, not on the 
number of equations. 

At the start of this program a preliminary 
solution is initialized as an empty list. Then 
with each incoming equation the following steps 
are performed. 

• The equation is simplified modulo a pre- 
liminary solution stored in the form of a 
substitution list g>j = fi(gj) where g^ are 
the unknowns and /j are linear expressions 
in the unknowns except g^ from any of the 
left hand sides. 

• If the resulting simplified equation is not 
an identity then it is solved for one of the 

• substitutions g m = f m (gj) are performed 
in all fi of the preliminary solution, and 

• g-m = fm(gj) is appended to the substitu- 
tion list. 

In the computation of high order symmetries 
this procedure is applied to the remaining sys- 
tem after all 1-term equations have been de- 
termined and used. The remaining system is 
small enough that it does not have to be stored 
on hard disk and then read again from disk. 



3.3 Sorting equations 

A first speedup is obtained by sorting equa- 
tions according to size, shortest first, before pro- 
cessing them as described above leading to the 
times shown in column (B) of table HI For ex- 
ample, for symmetries of degree 9 the speed up 
is a factor of 4. 

Processing shorter equations earlier means 
that the substitution list involves shorter right 
hand sides and reduces incoming equations to 
shorter size earlier which adds shorter new sub- 
stitution rules to the preliminary solution, i.e. 
it is a self-amplifying increase of efficiency. 

In symbolic mode of Reduce the sorting of 
equations can be done very effectively, for ex- 
ample, by establishing a list L = {li, I2, h, ...} 
where U is a list of (pointers to) all equations 
with i terms and then linking these lists. As- 
signing an equation to a list 1^ is done efficiently 
by having 2 pointers, one stepping from one 
term to the next in the equation and the other 
at the same time stepping from U to U + \. When 
the first pointer reached the end of the equation, 
the other pointer gives the list U under which 
the equation is then listed. 

3.4 Applying 1-term equations 

Sorting equations is beneficial if equations vary 
much in size. This is especially the case for 
selection systems with many equations having 
only one term. 

But not all the variables that take zero value 
in the solution need to appear in 1-term equa- 
tions in the original system. Many 1-term equa- 
tions may result only after the first 1-term equa- 
tions are applied. Still, during the solution pro- 
cess many 1-term equations are generated which 
justifies their special treatment. 

What makes 1-term equations special is the 
fact that replacing a variable by zero in a poly- 
nomial (or linear) expression can be done very 
fast without re-writing the expression, at least 
in a lisp-representation: the pointer from the 
previous term to the term that vanishes is sim- 
ply re-directed to the next term. Also, no 
re-simplification is necessary. In contrast, al- 
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ready the application of 2-term equations re- 
quires afterwards simplifications as the term 
resulting from substitution may combine with 
other terms. 

To set many variables in an expression to zero 
in a most efficient way, the value cell of the vari- 
able was set to nil. Testing a variable for a zero 
value can then be done by just testing whether 
the variable is bound. After this is done for all 
known vanishing variables, the (large linear) ex- 
pression in these variables is pruned only once 
by the special low level routine PruneZeros. 
The pruning is done 'in place' avoiding copy- 
ing of the whole expression and also reducing 
future garbage collections. 

All efficiency improving measures introduced 
in the sections 13. 2\ 13. 3^ 13.41 are applied in the 
procedure LSSS [TU] but can also be performed 
alone on given expressions. 

In the following subsections techniques are 
described that make the formulation of the lin- 
ear system more efficient, or avoid the formula- 
tion of large selection systems. 

3.5 Selective splitting of equa- 
tions 

The following steps are all computationally ex- 
pensive: 

• the formulation of two large symmetry con- 
ditions jS]), 

• their separation (called 'splitting' in the 
following) into two large linear systems 
with many redundant equations by setting 
coefficients of different products of powers 
of u, v separately to zero, and 

• millions of simplifications of the large linear 
systems due to millions of vanishing vari- 
ables. 

The idea is to avoid most of these computations 
by 

• extracting selectively only 1-term equa- 
tions from the symmetry conditions (J5]), 
and 



• using them to simplify the symmetry con- 
ditions themselves, and 

• to repeat this procedure as long as 1-term 
equations can be found (see section |3TT|) . 

Finally, the much smaller symmetry conditions 
are completely split and the resulting linear sys- 
tem is solved using LSSS as described in the 
previous three subsections. 

The only remaining large step is the initial 
formulation of the symmetry conditions. One 
can save half of that computation by formu- 
lating only one symmetry condition, extracting 
from it and applying to it repeatedly 1-term 
equations, then pruning the ansatz for the sym- 
metry (j3|) before computing the second symme- 
try condition. 

Furthermore, even the formulation of the first 
symmetry condition can be postponed if short 
additional necessary conditions are known as 
described in the next subsection. From them 1- 
term equations, i.e. vanishing variables can be 
extracted and the ansatz for the symmetry (J3J) 
be pruned before any new symmetry condition 
is formulated. 

The interplay between formulating and solv- 
ing equations may be the only way to approach 
a large problem which otherwise could even not 
be formulated. For example, this iterative ap- 
proach was necessary to compute symmetries of 
degree 14 on the available computer hardware 
with 128 GB memory under PSL Reduce. The 
same approach was also applied in [12J to solve 
a non-linear system which was too large to be 
formulated at once. 

The key to be able to selectively split up 1- 
term equations is a low level procedure written 
in Symbolic Mode of Reduce that takes 
only two lines. It recursively steps through an 
expression and identifies 1-term coefficients of 
any monomial in u, v, u^ 1 , v^ 1 . If such a coeffi- 
cient is found, it directly sets the value cell of 
the variable to nil (see section 13.41 for the case 
that a system of linear equations is given from 
which 1-term equations are picked). 
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3.6 Utilizing additional neces- 
sary conditions 

Apart from a system of equations to be solved 
sometimes additional necessary conditions are 
available. The question is whether these condi- 
tions can be utilized to speed up the solution of 
the original system. If a program is performing 
its computation with the whole system that is 
to be solved at once then solving in addition 
extra conditions results in an increase of the to- 
tal computation time. On the other hand, if the 
program is able to extract information from the 
system selectively then having extra necessary 
equations available means that the program has 
more options and may be able to solve the com- 
bined system faster than the original system 
alone. 

If a program is able to generate repeatedly 
1-term equations and to apply them then it is 
beneficial for the program to have extra nec- 
essary conditions (!7a|) . Even if such opportu- 
nities do not exist, extra necessary conditions 
can already be useful if the length of equations 
varies much and if all equations are sorted by 
size and if system plus extra necessary condi- 
tions are very overdetermined. 



3.7 Optimizing iterations 

An optimal strategy to speed up computation 
is not to find and use as many as possible van- 
ishing variables but to find and use as many 
as possible per time. This means one wants to 
find an optimum between formulating new con- 
ditions ([7]) , fl5]) which each provide many vanish- 
ing variables but which take considerable time 
to formulate, or to utilize already formulated 
conditions by extracting again vanishing vari- 
ables, utilizing them, extracting more, and so 
on. The second process is faster than formulat- 
ing new conditions but the number of vanishing 
variables that are found is gradually decreasing 
as shown in table [3j The following comments 
refer to the computation of symmetries of de- 



gree 13|f| 

After extracting vanishing variables from the 
necessary condition ( !7a| one has the options to 

• prune D T (u,v) (8 sec) and re-formulate 
f!7a|) (39 sec), or 

• prune D T (u,v) (8 sec) and formulate the 
other necessary condition (l7b|) (39 sec) 

• prune the already formulated condition 
(ITajl (37.3 sec) 

before extracting vanishing variables from that 
equation which was just formulated or pruned. 
Because in all 3 cases about 1,535,270 new van- 
ishing variables are found, the third option is 
the fastest and most efficient one which is there- 
fore repeated several times. As shown on table 
[3] the number of vanishing variables that are 
found, shrinks and the time to scan the condi- 
tion stays constant, thus it pays off after three 
runs to invest in the formulation of a new con- 
dition 

u Tt = u tT . (8) 

After a first selective split of (|8]) it is most ef- 
fective to prune and selectively split (!7a|) again, 
even twice before continuing to prune and split 
selectively It turns out that formulating in 
addition the second symmetry condition 

Vrt = V tT (9) 

only for the purpose of selective splitting does 
not result in additional 1-term equations and is 
therefore not beneficial. 

To summarize, denoting the pruning and se- 
lective splitting of necessary condition (I7a|) by 
n and the pruning and selective splitting of 
symmetry condition ([8]) by s, the sequence 
n 3 (snn) 4 (sn) 4 followed by a formulation of the 
symmetry condition (jSJ), a complete splitting of 
flSJ) and complete splitting of what is left of (jSJ) 
and a call of LSSS to solve the linear system 
brings down the total time for formulating and 
solving symmetry conditions for degree 13 to 

3 Given times refer to the computation of degree 13 
symmetries on a single CPU of a 8 core node (2 sockets 
x 4 cores per socket) Xeon @ 2.93 GHz using 48 GB 
memory of the node. 
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Table 3: Times and results for extracting 1-term equations repeatedly in the computation of 
symmetries of degree 13. t\. time to formulate (!7a|) in run 1 and to prune it in later runs, t2~. 
time to 1-term-split ( !7a|) . new: number of new vanishing variables found in each run. 



467 sec, compared to 3615 sec when straight- 
forwardly formulating and splitting OH]), fll]) and 
solving them with LSSS. 

An additional benefit of utilizing 1-term 
equations consists in a drastic reduction of 
memory needs to only 13 GB for PSL Reduce 
or 7 GB for CSL Reduce (see section |5] about 
the differences between PSL and CSL Reduce) 
whereas the ad hoc formulation of the large 
symmetry conditions OH]), © requires 120 GB 
in PSL Reduce or 60 GB in CSL Reduce for 
degree 13 symmetries. Without selective split- 
ting of 1-term equations, i.e. determination of 
vanishing variables, it would not have been pos- 
sible to determine symmetries of degree 15 on 
the computer with 128 GB and of degree 16 on 
a computer with 256 GB. 

3.8 Applying Solutions of large 
linear Systems 

In applications the solution of a system of equa- 
tions has usually to be substituted in expres- 
sions that are the main interest of the appli- 
cation. But millions of variables to be sub- 
stituted in expressions with millions of terms 
is expensive. In such situations the labelling 
of zero-valued variables and the pruning of 
large expressions containing them becomes use- 
ful again. 

In table HI column (D) subst. shows drasti- 
cally reduced substitution times of the solution 
into the symmetry ansatz fl4]) compared to col- 
umn (C) subst. After the computation in col- 
umn (D) no explicit substitutions are necessary. 
Any expression containing the unknowns needs 
only a) to be pruned to drop terms involving 



zero- valued variables and b) be simplified where 
non-zero variables get replaced by their com- 
puted value. 

4 Results 

The impact of measures described in the previ- 
ous section is shown in table HI The entries in 
columns with the header 'solve' give the time in 
sec to solve the linear algebraic system resulting 
from splitting the symmetry condition ([5]) (de- 
noted in the following by D[ ttT ](u,v) = 0) wrt. 
monomials in u, v , u -1 , v^ 1 . Columns with the 
header 'subst.' show the time to substitute the 
solution into the symmetry ansatz (|4])Q 

As described in the previous section, col- 
umn (A) was produced with the original stream 
solver p. Sorting equations by size results in 
a speedup of a factor up to 4 shown in column 

(B) and column (C) shows the times when 1- 
term equations are applied first as described 
in section 13.41 Computations in columns (A)- 

(C) use standard substitutions and in column 
(C) the explicit formulation of 1-term equa- 
tions and complete splittings of equations. The 

4 Columns (A)-(C) have been run on one Opteron 
2.2 GHz core of a 32 core node (8 sockets x 4 cores per 
socket) with 128 GB memory. In column (D) n = 3. .15 
were run on one Xeon 2.4 GHz core of a 16 core node 
with 128 GB memory and n — 16 was run on a single 
core of a machine equipped with 4 AMD Opteron pro- 
cessors each having 12 cores running at 2. 2. GHz and 
with 256 GB of main memory. Computation times 
turned out to be strongly dependent on the load on 
the remaining cores of the node and the other nodes of 
the cluster. The times in column (D) are conservative 
times which are always reproducible. Sometimes com- 
putations have been up to 30% faster than shown in 
column (D). 
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(B): 


(C): 1-term equ., 


(D) 




n 






sorting by size, 


sorting 


by size, 


all techniques 




stream solve 


stream solve 


stream solve 


of section [3] 




solve 


subst. 


solve 


subst. 


solve 


subst. 


solve 


subst. 


3 


.02 


.01 


.01 


.00 


.00 


.00 


.01 





4 


.14 


.01 


.15 


.02 


.02 


.02 


.03 





5 


1.6 


.17 


1.3 


.23 


.20 


.16 


.08 





6 


20.8 


1.46 


15.3 


2.4 


1.7 


1.3 


.26 





7 


206 


9.0 


85 


9.0 


16.3 


15.4 


.77 





8 


2,740 


82.5 


808 


92.8 


241 


223 


2.7 


.01 


9 


53,940 


1,190 


13,335 


1,482 


4,645 


2,675 


9.1 


.01 


10 










60,140 


20,360 


30 


.03 


11 














96 


.07 


12 














302 


.13 


13 














927 


.34 


14 














2284 


.42 


1 K 
10 














7587 


1.58 


16 














27970 


3.16 



Table 4: Times in sec for solving the symmetry conditions (|5]) for each degree n by different 
methods and for substituting the computed values into the symmetry ansatz (T4]). 



drastic improvement shown in column (D) be- 
came possible through direct detection of van- 
ishing variables (specialized partial splitting), 
repeated partial splitting, assigning nil to the 
value cell of a variable and thus avoiding substi- 
tutions and enabling fast pruning of expressions 
from zero variables. 

Table [5] compares four runs of the n — 13 
casejf) In all these computations the program 
LSSS is used to solve the linear algebraic sys- 
tems (the row 'complete solution of all remain- 
ing equations'). LSSS repeatedly applies 1- 
term equations and sorts the remaining equa- 
tions by size before starting the stream-solver. 
What table [5] shows is the benefit of generat- 
ing and solving the linear algebraic system in 
stages. With the availability of LSSS as an 
effective solver of large selection systems, the 
main cost shifted to the formulation of the lin- 
ear system. 



5 AU times in this table include CPU time and 
garbage collection time of CSL Reduce running on one 
of 16 cores (4 sockets x 4 cores per socket) Xeon 2.4 
GHz node with 128 GB memory. 



In the first column the whole system is for- 
mulated and solved at once. In the second col- 
umn at first Du T ]U = is formulated, then 
1-term equations are extracted and utilized to 
simplify the ansatz for D T u and D T v before 
Du t yv = is formulated and the complete sys- 
tem is solved. This provides a speedup of nearly 
1.7 for n = 13. 

The third column gives the times when the 
whole computation is done in 3 stages. In the 
first step the necessary condition (!7aj) : D T I = 
J2k°=-k ak ^ k * s formulated, and once vanishing 
variables are extracted which are used to sim- 
plify the symmetry ansatz (T4]) and speed up 
the formulation of the system. Although that 
first step is an additional computation not per- 
formed in the first two runs costing an extra 
238 sec, this is more than compensated after- 
wards by the speedup of computing and solving 
D[ tjT ] (u, v) = (also in two stages) leading to 
another overall speedup factor of about 1.7. 

The fourth column reports the times when at 
first a sequence of runs is performed where van- 
ishing variables are extracted and set to nil as 
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complete solution of all 
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o r* 

365 


125 
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92.7 


substitution of solution 










in D T (u, f) 


z.yo 


n qo 


U.oO 


n iq 
U. lo 


total time (rounded) 


3314 


1946 


1161 


924 



Table 5: Times in sec of the whole symmetry computation for n = 13 



explained in section 13.71 This leads to another 
modest overall speedup of 1.25 but the main 
advantage of the computation in the fourth col- 
umn is to lower the maximum memory require- 
ments that occur when the first half of symme- 
try conditions is formulated. 

The next section describes how differences 
between two versions of the computer algebra 
system Reduce become important for large 
computations. 

5 Reduce issues 

The open-source computer algebra system Re- 
duce can be run on two different LISP imple- 
mentations. They are PSL (Portable Standard 
Lisp) and CSL (Codemist Standard Lisp). 

The computations of this article require a 
very large number of identifiers. CSL has no 
restriction on the number of identifiers that 
can be used, whereas PSL is usually limited to 
65,000 identifiers. To run the symmetry com- 



putations reported in this paper in PSL, an ex- 
tended version of PSL Reduce had been devel- 
oped by Winfried Neun that allows for 20 Mio 
identifiers. This extended version of PSL can 
be downloaded for free 

When applied straightforwardly, PSL Re- 
duce typically runs somewhat faster than CSL 
Reduce^] However, CSL allows for compila- 
tion of critical routines into C. When this is 
done, CSL runs in most cases as fast as PSL or 
even faster. Compilation into C can be achieved 
by creating a package and making a profiling 
run followed by a rebuild of the CSL-Reduce 
system. 

A big advantage of CSL is its garbage collec- 
tor. CSL uses both a copying and a mark-sweep 
collector. Copying garbage collectors have little 
overhead but can give only half of the memory 
to the application as the other half is needed for 

6 Thc timings reflect the state of CSL/PSL at the 
time of writing the paper. The performance of both 
LISP systems is under active development. 
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n 


W 










i 

s solve 


s tools 


ns solve 


ns tools 


3 


.0036 


.024 


.035 


.027 


4 


.11 


.089 


.11 


.085 


5 


.38 


.20 


.3 


.23 


6 


1.1 


.86 


1.2 


.80 


7 


4.3 


2.97 


4.9 


3.75 


8 


20.4 


13 


24 


16.9 


9 


128 


73 


162 


81.3 


10 


792 


262 


929 


294 


11 


7560 


871 


8615 


972 




JJINr 


6oZ 1 


JJINr 


6161 


13 




11786 




11840 



Table 6: Maple 14 times 
n degree of the symmetry 

s solve: solving the system fl5]) with the solve command 

s tools: solving the system (j5]) with the SolveTools : -Linear command 

ns solve: solving both systems ([7]) and (111) with the solve command 

ns tools: solving both systems and (JHD with the SolveTools : -Linear command 

DNF: does not finish in one week 



copying. A mark-sweep garbage collector allows 
the use of all of the memory by the application. 
In CSL the copying collector is used as long 
as the memory pressure is low, but switches 
to mark-sweep when the requirement for more 
memory rises. This way, CSL leverages per- 
formance and available memory. This feature 
of CSL made it possible to ultimately compute 
symmetries of degree 15 with 115 GB of mem- 
ory and symmetries of degree 16 on a different 
machine with 256 GB. 

6 Comparison with Maple 

In this section we report on applying the com- 
puter algebra system Maple 14 on solving our 
linear systems. Table [6] shows timeg^for solving 
the symmetry conditions (jSJ) with the MAPLE 
standard solve command in column (I) and 
with the SolveTools : -Linear command in col- 

7 Computations were run on one Xeon 2.4 GHz core 
of a 16 core node with 128 GB memory 



umn (J) and the times for solving the combined 
systems of additional conditions (j7J) and sym- 
metry conditions (jSJ) in columns (K) and (L). 

We see that for small systems, solve and 
SolveTools : -Linear take similar times. For 
larger systems (N = 11) SolveTools : -Linear 
is 9 times as fast and even bigger systems could 
not be solve with the command solve within a 
week. 

Another observation is that neither solve 
nor SolveTools : -Linear can take advantage 
of additional necessary equations except for 
n = 12 in column (L) to a small extent. The 
best times for Maple 14 in column (J) of ta- 
ble M and LSSS (Reduce) in column (D) in 
table H] have been measured on the same nodes 
of the same cluster and can be compared. To 
explain the 11786/927=12.7 times faster per- 
formance of LSSS for symmetries of degree 
13 the following test has been made. Table [7] 
shows the times in sec for Maple 14 procedure 
SolveTools : -Linear to solve 
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n 
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(JNJ 




(+J 


(Q): read 


f\/\ 1 1 r\ i td i r\\\ 




unsimp. 


unsorted 


sorted 


1 . p 

simplify 


simplified sys. 


£ -l £ J 

factor of speedup 
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.024 


.0012 


.0014 





.010 


2.1 


4 


.089 


.0084 


.0084 


.01 


.012 


2.9 


5 


.20 


.011 


.0092 


.01 


.012 


6.4 


6 


.86 


.036 


.037 


.03 


.016 


10 


7 


3.0 


.047 


.047 


.09 


.021 


19 


8 


13 


.19 


.22 


.31 


.037 


23 


9 


73 


.41 


.38 


1.04 


.060 


49 


10 


262 


2.6 


2.7 


3.44 


.118 


42 


11 


871 


4.7 


4.9 


11.2 


.228 


53 


12 


3527 


32 


31 


31.7 


.443 


56 


1 Q 

lo 


I 1 VSfi 

II / oo 




1 1 


O/ .0 


.y4U 


1 A 


14 




402 


387 




1.574 





Table 7: Maple 14 times 

• the unsimplified system (EJ in column (M) 
(=column (J) of table [6]), 

• the same system simplified after 1-term 
equations have been solved repeatedly in 
column (N), 

• and then in addition all equations being 
sorted by size, shortest first in column (O). 

The cost of these simplifications using the 
Reduce procedures FindZeros to find and 
utilize all 1-term equations repeatedly and 
LengthSort to sort the remaining system 
by size is shown in column (P). The time for 
Maple 14 to read the simplified system is 
given in column (Q). The rightmost column 
gives the factor of speedup of Maple if it would 
use at first the Reduce procedures to sim- 
plify the system before solving it. The potential 
speedup of Maple is impressive and increasing 
with increasing size of the system. 

7 Comparison with LinBox 

As described on its web page [16] LinBox 
is a C++ template library for exact, high- 
performance linear algebra computations with 
dense, sparse, and structured matrices over the 
integers and over finite fields. 



for pre-simplified systems 

Because the solution of our linear system (J5]) 
contains free parameters, LinBox can not be 
applied straightforwardly to solve our system. 
However, what can be computed easily is the 
rank of these systems. Table [8] compares the 
best times of Maple solving the system ([5]) 
(column (J) in table [6]) with the best times 
of Reduce achieved when solving ([5]) and us- 
ing additional necessary conditions (E3) and with 
LinBox at least determining the dimension of 
the nullspace of ([5]) and of ([5]) together with 
(CO). The timings of LinBox include reading of 
the sparse coefficient matrix of the linear sys- 
tem which is the only way to enter data into 
LinBox. 

8 The procedures 

The following four procedures are available for 
the formulation and solution of linear algebraic 
system with an emphasis on selection systems 
which have many zero variables in their solu- 
tion. 

The procedure LSSS can solve linear systems 
with an arbitrary number of equations and un- 
knowns, i.e. the linear system can be under- 
or overdetermined. In case the linear system is 
inconsistent a message will be printed. The un- 
knowns to be solved for must be ordered ahead 
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.09 
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.02 


.02 


.03 


5 


.20 




.13 


.12 


.08 


6 


.86 


30.6 


.90 


1.1 


.26 


7 


3 




12.9 


14.9 


.77 


8 


13 


3080 


210 


283.5 


2.7 


9 


73 




1812 


2318 


9.1 


10 


262 




21210 


21610 


30 


11 


871 








96 


12 


3527 








302 


13 


11786 








927 


14 










2284 












loot 


16 










27970 



Table 8: A comparison of times of Maple, LinBox and Reduce 



of the rest of the variables. The coefficients of 9 Summary 

the linear variables can be rational functions of 

parameters. 



The procedure FindZeros repeatedly picks 
all 1-term equations from an input list of equa- 
tions and sets the value cell of the vanishing 
variables to nil. It returns a list of vanishing 
variables and list of remaining equations. 

The procedure LengthSort effectively sorts 
a list of equations by their size. 

The procedure PruneZeros drops all terms 
of the input expression which should be zero 
due to an earlier run of LSSS or FindZeros. 

The procedures LSSS, FindZero and 
LengthSort are freely available from pU] 
where more details about syntax are given. 
These procedures are also included as module in 
the open source computer algebra system Re- 
duce. The server at [10] contains also files for 
all linear systems for symmetries of degree 3 to 
15 in REDUCE format and in LinBox format 
which can also be read by Maple. 



A class of sparse linear systems that allows ef- 
ficient solution strategies and that occurs fre- 
quently, for example, in integrability investiga- 
tions, is characterized by the vanishing of many 
of the unknowns in its solution. In association 
with the purpose of such systems we call them 
selection systems. A series of such systems that 
we investigated results from symmetry investi- 
gations of a non-abelian ODE system of Kont- 
sevich. It is demonstrated how the vanishing 
of many variables of a selection system can not 
only be used to speed up the solution of the 
system but also to avoid formulating most of 
the system and to apply the solution of the sys- 
tem to large expressions. It is demonstrated 
how a pre-processor step in which repeatedly 1- 
term equations are identified and applied and 
afterwards the remaining equations are sorted 
by size could speed up Maple routines that are 
specialized in sparse systems by a considerable 
factor. 
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